Understanding generalization error of SGD in nonconvex optimization
Authors
Abstract
The success of deep learning has led to a rising interest in the generalization property of the stochastic gradient descent (SGD) method, and stability is one popular approach to study it. Existing generalization bounds based on stability do not incorporate the interplay between the optimization of SGD and the underlying data distribution, and hence cannot even capture the effect of randomized labels on the generalization performance. In this paper, we establish generalization error bounds for SGD by characterizing the corresponding stability in terms of the on-average variance of the stochastic gradients. Such characterizations lead to improved bounds that can experimentally explain the effect of random labels on the generalization performance. We also study the regularized risk minimization problem with strongly convex regularizers, and obtain improved generalization error bounds for proximal SGD.
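To make the objects in the abstract concrete, the following is a minimal sketch (not code from the paper) of the two update rules it analyzes and of the on-average variance of the stochastic gradients in which the stability bounds are expressed; the function names, the choice of an l2 regularizer, and the step size eta are illustrative assumptions.

import numpy as np

def sgd_step(w, grad_fi, eta):
    # Plain SGD update on a single sampled loss gradient grad_fi(w).
    return w - eta * grad_fi(w)

def proximal_sgd_step(w, grad_fi, eta, lam):
    # Proximal SGD for the l2-regularized risk r(w) = (lam/2) * ||w||^2 (an
    # illustrative strongly convex regularizer). The prox of an l2 penalty is a
    # shrinkage, so prox_{eta*r}(w - eta*grad_fi(w)) = (w - eta*grad_fi(w)) / (1 + eta*lam).
    return (w - eta * grad_fi(w)) / (1.0 + eta * lam)

def on_average_gradient_variance(w, per_sample_grads):
    # On-average variance of the stochastic gradients at w:
    # (1/n) * sum_i ||grad f_i(w) - grad F(w)||^2, where F is the empirical risk.
    grads = np.stack([g(w) for g in per_sample_grads])  # shape (n, d)
    full_grad = grads.mean(axis=0)
    return np.mean(np.sum((grads - full_grad) ** 2, axis=1))

Because the prox of the l2 regularizer reduces to the shrinkage shown above, the regularized update in this sketch stays as cheap as plain SGD.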
Similar resources
Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization
The success of deep learning has led to a rising interest in the generalization property of the stochastic gradient descent (SGD) method, and stability is one popular approach to study it. Existing works based on stability have studied nonconvex loss functions, but only considered the generalization error of the SGD in expectation. In this paper, we establish various generalization error bounds...
Theory of Deep Learning III: Generalization Properties of SGD
In Theory III we characterize with a mix of theory and experiments the consistency and generalization properties of deep convolutional networks trained with Stochastic Gradient Descent in classification tasks. A present perceived puzzle is that deep networks show good predictive performance when overparametrization relative to the number of training data suggests overfitting. We describe an exp...
Nonconvex Optimization Is Combinatorial Optimization
Difficult nonconvex optimization problems contain a combinatorial number of local optima, making them extremely challenging for modern solvers. We present a novel nonconvex optimization algorithm that explicitly finds and exploits local structure in the objective function in order to decompose it into subproblems, exponentially reducing the size of the search space. Our algorithm’s use of decom...
Improving Generalization Performance by Switching from Adam to SGD
Despite superior training outcomes, adaptive optimization methods such as Adam, Adagrad or RMSprop have been found to generalize poorly compared to stochastic gradient descent (SGD). These methods tend to perform well in the initial portion of training but are outperformed by SGD at later stages of training. We investigate a hybrid strategy that begins training with an adaptive method and switc...
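A rough illustration of such a hybrid schedule (a sketch with assumed hyperparameters, not the authors' exact procedure) is to train with Adam and hand the same parameters over to SGD with momentum at a chosen switch epoch:

import torch

def train_with_switch(model, loader, loss_fn, epochs=30, switch_epoch=10):
    # switch_epoch and the learning rates below are illustrative choices.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        if epoch == switch_epoch:
            # Hand off to SGD with momentum for the later phase of training.
            opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model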
Nonconvex Robust Optimization
We propose a novel robust optimization technique, which is applicable to nonconvex and simulation-based problems. Robust optimization finds decisions with the best worst-case performance under uncertainty. If constraints are present, decisions should also be feasible under perturbations. In the real world, many problems are nonconvex and involve computer-based simulations. In these applications...
Journal
Journal title: Machine Learning
Year: 2021
ISSN: 0885-6125, 1573-0565
DOI: https://doi.org/10.1007/s10994-021-06056-w